High Availability Alerts

Use the following tables to learn about the HA alerts that Fault Management raises.

HA Service (Non-Redundant)

31050 HA Service (Non-Redundant)
Description Send an alert when the standby node is not up, which indicates that the system has no redundancy.
Preconditions

Starting with EFA 3.1.0, a timer task periodically monitors the status of the standby node, and raises an event to the fault management system. The fault management system raises an alert to the user to indicate that the system is not fully redundant.

  • For HA events, the polling frequency is every minute.
Requirements
Alert shows the following data:
  • Node IP
The following example shows an alert when the standby node is down:
<114>1 2003-10-11T22:14:15.003Z xco.machine.com FaultManager - -
   [meta sequenceId="47"]
   [origin ip="10.20.30.40" enterpriseId="1916" software="XCO" swVersion="3.4.0"]
   [alert@1916
   resource="/App/System/HA/Nodes/Node"
   alertId="31050"
   cause="lossOfRedundancy"
   type="operationalViolation"
   severity="minor"]
   [alertData@1916
   node_ip="10.1.2.4"]
   BOMHA degraded, node 10.1.2.4 is down.
Health Response
Response
{
    Resource: /App/System/HA/Nodes/Node
    HQI {
        Color: Orange
        Value: 3
    }
    StatusText: HA degraded, node 10.1.2.4 is down.
}
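
The alert in this entry carries its details in bracketed structured-data elements (`meta`, `origin`, `alert@1916`, `alertData@1916`). The following Python sketch shows one way a receiver might pull those fields out of a message. It is an illustration, not part of the product: the function name `parse_structured_data` is hypothetical, the sample string is the example above joined onto one line, and the parser assumes that values contain no `]` or quote characters. It accepts both straight quotes and the curly quotes that appear when examples are copied from formatted documents.

```python
import re

# One bracketed element, e.g. [origin ip="10.20.30.40" software="XCO"].
# Assumption: values never contain "]" (escaped brackets are not handled).
SD_ELEMENT = re.compile(r'\[(?P<id>[^\s\]]+)(?P<params>[^\]]*)\]')
# name="value" pairs; straight or curly double quotes are both accepted.
SD_PARAM = re.compile(
    r'(?P<name>[\w.-]+)=["\u201c\u201d](?P<value>[^"\u201c\u201d]*)["\u201c\u201d]'
)


def parse_structured_data(message: str) -> dict:
    """Return {element_id: {param_name: value}} for each bracketed element."""
    result = {}
    for element in SD_ELEMENT.finditer(message):
        params = {m["name"]: m["value"] for m in SD_PARAM.finditer(element["params"])}
        result[element["id"]] = params
    return result


sample = (
    '<114>1 2003-10-11T22:14:15.003Z xco.machine.com FaultManager - - '
    '[meta sequenceId="47"] '
    '[origin ip="10.20.30.40" enterpriseId="1916" software="XCO" swVersion="3.4.0"] '
    '[alert@1916 resource="/App/System/HA/Nodes/Node" alertId="31050" '
    'cause="lossOfRedundancy" type="operationalViolation" severity="minor"] '
    '[alertData@1916 node_ip="10.1.2.4"] '
    'BOMHA degraded, node 10.1.2.4 is down.'
)

parsed = parse_structured_data(sample)
print(parsed["alert@1916"]["alertId"])       # alert identifier
print(parsed["alertData@1916"]["node_ip"])   # IP of the node that is down
```

Keying the result by element ID keeps the alert fields (`alert@1916`) separate from the alert-specific data (`alertData@1916`), which differs between alerts.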

HA Service (Fully Redundant)

31051 HA Service (Fully Redundant)
Description Send an alert when the standby node is up and ready, which indicates that the system is fully redundant.
Preconditions

A timer task periodically monitors the status of the nodes and raises an event to the fault management system. The fault management system raises an alert to the user to indicate that the system is fully redundant.

  • For HA events, the polling frequency is every minute.
Requirements
Alert shows the following data:
  • None
The following example shows an alert when the standby node is up and running:
<118>1 2003-10-11T22:14:15.003Z xco.machine.com FaultManager - -
   [meta sequenceId="47"]
   [origin ip="10.20.30.40" enterpriseId="1916" software="XCO" swVersion="3.4.0"]
   [alert@1916
   resource="/App/System/HA/Nodes/Node"
   alertId="31051"
   cause="redundancyRestored"
   type="operationalViolation"
   severity="info"]
   BOMHA fully redundant
Health Response
Response
{
    Resource: /App/System/HA/Nodes/Node
    HQI {
        Color: Green
        Value: 0
    }
    StatusText: HA fully redundant.
}
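
Each example alert starts with a syslog priority value in angle brackets, such as `<118>`. In the standard syslog encoding, this value equals facility × 8 + severity. The sketch below decodes it. The function name `decode_pri` is hypothetical, and the severity names are the standard syslog ones; how the product maps them to the `severity` parameter inside `alert@1916` is not stated in this section.

```python
import re

# Standard syslog severity names, indexed by severity code 0..7.
SEVERITY_NAMES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]


def decode_pri(line: str) -> tuple:
    """Split the leading <PRI> value into (facility, severity, severity_name)."""
    match = re.match(r"<(\d{1,3})>", line)
    if match is None:
        raise ValueError("line does not start with a syslog PRI field")
    facility, severity = divmod(int(match.group(1)), 8)
    return facility, severity, SEVERITY_NAMES[severity]


print(decode_pri("<118>1 2003-10-11T22:14:15.003Z xco.machine.com FaultManager - -"))
# (14, 6, 'informational')
```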

HA Service (Failover Occurred)

31052 HA Service (Failover Occurred)
Description Send an alert when an HA failover has occurred.
Preconditions

A timer task periodically monitors the status of the nodes and raises an event to the fault management system. The fault management system raises an alert to the user to indicate that an HA failover has occurred.

  • For HA events, the polling frequency is every minute.
Requirements
Alert shows the following data:
  • Active IP
The following example shows an alert when an HA failover occurs:
<114>1 2003-10-11T22:14:15.003Z xco.machine.com FaultManager - -
   [meta sequenceId="47"]
   [origin ip="10.20.30.40" enterpriseId="1916" software="XCO" swVersion="3.4.0"]
   [alert@1916
   resource="/App/System/HA/Nodes/Node"
   alertId="31052"
   cause="localNodeTransmissionError"
   type="operationalViolation"
   severity="major"]
   [alertData@1916
   active_iP="10.1.2.3"]
   BOM10.1.2.3 is now the HA active node
Health Response
Response
{
    Resource: /App/System/HA/Nodes/Node
    HQI {
        Color: Red
        Value: 4
    }
    StatusText: 10.1.2.3 is now the HA active node.
}
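
Because HA status is polled every minute, a receiver could see the same HA alert more than once. Whether the product suppresses repeats is not stated in this section. The sketch below shows one way a receiver might act only when the HA state changes. The class `HaStateTracker` and the state labels are hypothetical. Only the alert IDs 31050, 31051, and 31052 come from this section, and any other ID is ignored.

```python
from typing import Optional

# HA alert IDs from this section, with illustrative state labels.
HA_ALERTS = {
    "31050": "non-redundant",
    "31051": "fully redundant",
    "31052": "failover occurred",
}


class HaStateTracker:
    """Remembers the last HA alert seen and reports only changes."""

    def __init__(self) -> None:
        self.last_alert_id: Optional[str] = None

    def observe(self, alert_id: str) -> Optional[str]:
        """Return a state label when the HA alert differs from the last one."""
        if alert_id not in HA_ALERTS or alert_id == self.last_alert_id:
            return None  # not an HA alert, or a repeat of the current state
        self.last_alert_id = alert_id
        return HA_ALERTS[alert_id]


tracker = HaStateTracker()
changes = []
for alert_id in ["31050", "31050", "31051", "31053", "31052"]:
    change = tracker.observe(alert_id)
    if change is not None:
        changes.append((alert_id, change))

print(changes)
```

In this run, the repeated 31050 and the non-HA alert 31053 produce no change, so three state changes are recorded.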

Service Degraded

31053 Service Degraded
Description Send an alert when some of the node services are not operational.
Preconditions A timer task periodically monitors the node status and raises an event to the fault management system. The fault management system raises an alert to the user to indicate that some of the node services are not running.
  • For service events, the polling frequency is every minute.
Requirements

Alert shows the following data:

  • None
The following example shows an alert when some of the node services are not running:
<116>1 2003-10-11T22:14:15.003Z xco.machine.com FaultManager - -
   [meta sequenceId="47"]
   [origin ip="10.20.30.40" enterpriseId="1916" software="XCO" swVersion="3.4.0"]
   [alert@1916
   resource="/App/System/HA/Nodes/Services"
   alertId="31053"
   cause="serviceDegraded"
   type="operationalViolation"
   severity="warning"]
   BOMSome of the services are not operational.
Health Response
Response
{
    Resource: /App/System/HA/Nodes/Services
    HQI {
        Color: Yellow
        Value: 2
    }
    StatusText: Some of the services are not operational.
}

Service Restored

31054 Service Restored
Description Send an alert when all the node services are operational.
Preconditions A timer task periodically monitors the node status and raises an event to the fault management system. The fault management system raises an alert to indicate to the user that all the node services are running.
  • For service events, the polling frequency is every minute.
Requirements

Alert shows the following data:

  • None
The following example shows an alert when all the node services are running:
<118>1 2003-10-11T22:14:15.003Z xco.machine.com FaultManager - -
   [meta sequenceId="47"]
   [origin ip="10.20.30.40" enterpriseId="1916" software="XCO" swVersion="3.4.0"]
   [alert@1916
   resource="/App/System/HA/Nodes/Services"
   alertId="31054"
   cause="serviceRestored"
   type="operationalViolation"
   severity="info"]
   BOMServices are in running state.
Health Response
Response
{
    Resource: /App/System/HA/Nodes/Services
    HQI {
        Color: Green
        Value: 0
    }
    StatusText: Services are in running state.
}
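
The five alerts above can be collected into a single lookup table, which may help when triaging several alerts at once. The following sketch copies the severity, color, and HQI value from the examples in this section. The names `AlertHealth`, `ALERT_HEALTH`, and `worst_alert` are hypothetical. The ordering also rests on an assumption inferred from the examples (Green 0 up to Red 4) and not stated in the text: that a higher HQI value means worse health.

```python
from typing import NamedTuple, Optional


class AlertHealth(NamedTuple):
    name: str
    severity: str
    color: str
    hqi_value: int


# Values copied from the example alerts and health responses in this section.
ALERT_HEALTH = {
    31050: AlertHealth("HA Service (Non-Redundant)", "minor", "Orange", 3),
    31051: AlertHealth("HA Service (Fully Redundant)", "info", "Green", 0),
    31052: AlertHealth("HA Service (Failover Occurred)", "major", "Red", 4),
    31053: AlertHealth("Service Degraded", "warning", "Yellow", 2),
    31054: AlertHealth("Service Restored", "info", "Green", 0),
}


def worst_alert(alert_ids) -> Optional[int]:
    """Return the known alert ID with the highest HQI value, or None.

    Assumption: a higher HQI value means worse health.
    """
    known = [alert_id for alert_id in alert_ids if alert_id in ALERT_HEALTH]
    if not known:
        return None
    return max(known, key=lambda alert_id: ALERT_HEALTH[alert_id].hqi_value)


print(worst_alert([31053, 31050, 31054]))  # 31050
```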